ON THE CONTINUOUS GOLD-MINING EQUATION


Richard  Bellman  and  Sherman  Lehman 
The  RAND  Corporation  and  Stanford  University 


§1. Introduction.


In some previous communications, [1] and [2], we have described
some results obtained in the investigation of a dynamic
programming problem, the "gold-mining" problem, which led to the
functional equation


\[
f(x,y) = \max \left[ \begin{array}{ll}
A: & \sum_{i=1}^{N} p_i \bigl( r_i x + f((1-r_i)x,\, y) \bigr) \\
B: & \sum_{i=1}^{N} q_i \bigl( s_i y + f(x,\, (1-s_i)y) \bigr)
\end{array} \right], \qquad x, y \ge 0, \tag{1.1}
\]

where $p_i, q_i \ge 0$, $0 \le r_i, s_i \le 1$, and $\sum_i p_i,\ \sum_i q_i < 1$.
The solution of this equation was given, inter alia, in [1] and shown
to have a relatively simple form. In addition, a partial solution of a
more complicated equation, corresponding to a nonlinear utility
function, was given in [2], having the same form. It is known, however,
as a result of an unpublished counter-example due to H. N. Shapiro and
S. Karlin, that the solution of more general equations such as


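\[
f(x,y) = \max \left[ \begin{array}{ll}
A: & p_1 \bigl( r_1 x + f((1-r_1)x,\, y) \bigr) \\
B: & q_1 \bigl( r_2 y + f(x,\, (1-r_2)y) \bigr) \\
C: & p_2 \bigl( r_3 x + r_4 y + f((1-r_3)x,\, (1-r_4)y) \bigr)
\end{array} \right], \tag{1.2}
\]

$0 \le r_1, r_2, r_3, r_4 \le 1$, $0 < p_1, q_1, p_2 < 1$, cannot have the
same simple form for all values of the parameters.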
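
For readers who wish to experiment with (1.2) numerically, the following is a minimal sketch of successive approximation. It assumes that the process is cut off after a fixed number of stages, so it yields a lower approximation to f rather than f itself. The function name `value_1_2`, the cutoff `depth`, and the label C for the third operation are illustrative choices, not taken from [1] or [2].

```python
from functools import lru_cache


def value_1_2(x0, y0, p1, q1, p2, r1, r2, r3, r4, depth=40):
    """Approximate f(x0, y0) for equation (1.2), truncated after `depth` stages."""

    @lru_cache(maxsize=None)
    def f(i, j, k, n):
        # (i, j, k) counts past uses of A, B and the third operation C;
        # n is the number of stages still allowed.
        if n == 0:
            return 0.0
        x = x0 * (1 - r1) ** i * (1 - r3) ** k
        y = y0 * (1 - r2) ** j * (1 - r4) ** k
        a = p1 * (r1 * x + f(i + 1, j, k, n - 1))
        b = q1 * (r2 * y + f(i, j + 1, k, n - 1))
        c = p2 * (r3 * x + r4 * y + f(i, j, k + 1, n - 1))
        return max(a, b, c)

    return f(0, 0, 0, depth)
```

Since every stage adds a nonnegative return, increasing `depth` cannot decrease the result, and the result never exceeds x0 + y0, the total amount of gold available.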

In an effort to gain some insight into the structure of particular
classes of solutions of (1.2), and similar equations of more
complicated type, we have been led to consider some continuous
analogues of these equations. There are many different procedures
for obtaining these continuous analogues. One which we have followed
to begin with leads to problems in the calculus of variations.

An essential feature of our research lies in viewing a policy
in its extensive rather than normal form, to borrow the terminology
of game theory. Another way of stating this is that instead
of determining the complete solution for one set of initial
parameters, which would correspond to determining the extremal curve
in the classical theory of the calculus of variations, we attack
our problem by imbedding it into the family of problems of this
type with arbitrary initial parameters. This is the approach used
throughout the theory of dynamic programming, cf. [1], [2], [3],
[4], [5]. Having done this we determine an optimal continuation
from each position, which upon being carried through yields an
optimal policy.

This approach, which may be considered a variant in problems
of deterministic type, is in many ways a necessity in problems
of stochastic type. It is possible to treat many of the classical
problems in the calculus of variations by means of this
technique. We shall return to this point at some future time.

Guided by our knowledge of the solution in the discrete cases,
and using the behavioristic approach described above, we have been
able to solve completely and explicitly a variety of problems which
are intractable in the original discrete form.

In the following sections we shall discuss the simplest
counterpart of (1.2) in continuous form, and list a number of typical
results we have obtained. Following this we shall sketch
briefly a formulation of the more general continuous version which
results from processes corresponding to (1.1) and which requires
more powerful techniques.

A  more  complete  discussion  and  proofs  of  the  results  contained 
herein  will  appear  elsewhere.  Further  results  concerning  more 
general problems discussed in [jjJ will be presented subsequently.

§2. Formulation.

In the formulation of problems involving the use of continuous
policies we are immediately faced with the difficulty of
defining what we mean by a continuous mixed strategy, and of
constructing the appropriate mathematical theory with which to handle
this thorny concept. To circumvent these conceptual and mathematical
difficulties, we shall utilize an idea emphasized in [jS],
which, briefly put, is that for mathematical purposes, mixing at
a point is to an arbitrary degree of approximation equivalent to
mixing pure strategies in an interval about the point.

Let us then consider a process where we are given two initial
quantities, x and y, the gold mines of [2], and two operations,
A and B, mining operations. If A is used over a time interval $\delta$,
there is a probability $1 - q_1\delta + o(\delta)$ that
$r_1 x \delta + o(\delta)$ is obtained and that the process is allowed
to continue, with the new initial amounts $x - r_1 x \delta + o(\delta)$,
$y$; and a probability $q_1\delta + o(\delta)$ that nothing is obtained
and the process terminates. In a like manner, if B is used, there is
a probability $1 - q_2\delta + o(\delta)$ that $r_2 y \delta + o(\delta)$
is obtained and the process continues; and a probability
$q_2\delta + o(\delta)$ that the process terminates.

To introduce the concept of mixing, we consider first the case
in which the time interval is divided into intervals of length $\Delta$,
where $\Delta$ is small. In a typical interval $[t, t+\Delta]$,
$t = k\Delta$, the first part of the interval, $[t, t+\phi_1\Delta]$,
will be devoted to the use of A; while the second part,
$[t+\phi_1\Delta, t+\Delta]$, will be devoted to the use of B. In the
limit, as $\Delta \to 0$, we obtain the effect of mixing A and B at t
in the ratio $\phi_1 : (1-\phi_1)$; cf. [6] for further discussion.
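
The limiting argument can be checked numerically. The sketch below assumes a constant mixing fraction `phi` and treats each phase exactly: while A is in use, x decays at rate r1 and the survival probability decays at rate q1, and similarly for B with r2 and q2. It compares the sliced process with the constant mixture it is expected to approach. The function names, the finite `horizon`, and the sample parameters in the usage note are illustrative assumptions.

```python
import math


def sliced_return(phi, delta, x0, y0, r1, r2, q1, q2, horizon=40.0):
    """Expected return when each interval of length delta uses A for
    phi*delta and then B for (1-phi)*delta (each phase treated exactly)."""
    a, b = phi * delta, (1.0 - phi) * delta
    x, y, p, total = x0, y0, 1.0, 0.0
    for _ in range(int(round(horizon / delta))):
        # Phase A: x decays at rate r1, survival probability at rate q1.
        total += p * x * r1 * (1.0 - math.exp(-(r1 + q1) * a)) / (r1 + q1)
        x *= math.exp(-r1 * a)
        p *= math.exp(-q1 * a)
        # Phase B: y decays at rate r2, survival probability at rate q2.
        total += p * y * r2 * (1.0 - math.exp(-(r2 + q2) * b)) / (r2 + q2)
        y *= math.exp(-r2 * b)
        p *= math.exp(-q2 * b)
    return total


def mixed_return(phi, x0, y0, r1, r2, q1, q2):
    """Return of the limiting constant mixture: x decays at phi*r1,
    y at (1-phi)*r2, survival at Q = phi*q1 + (1-phi)*q2."""
    big_q = phi * q1 + (1.0 - phi) * q2
    return (phi * r1 * x0 / (phi * r1 + big_q)
            + (1.0 - phi) * r2 * y0 / ((1.0 - phi) * r2 + big_q))
```

With, say, `phi = 0.5`, x0 = y0 = 1, r1 = r2 = 1, q1 = 1, q2 = 2, the gap between `sliced_return` and `mixed_return` shrinks as `delta` is reduced, which is the sense in which mixing over a short interval approximates mixing at a point.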

A strategy consists of a choice of $\phi_1$ for each of the points
$k\Delta$. We wish to determine the strategy which will maximize the
expected value of the amount obtained before the process terminates.
For any given strategy let

x(t) = quantity of gold remaining in first mine provided
       that the process has continued until t,

y(t) = quantity of gold remaining in second mine provided
       that the process has continued until t,

                                                            (2.1)

p(t) = probability that the process continues at least
       to t,

f(t) = expected amount obtained up to t.


Writing down the equations expressing $x(t+\Delta)$, $y(t+\Delta)$,
$p(t+\Delta)$, $f(t+\Delta)$ in terms of the values at t, and letting
$\Delta \to 0$, we are led to the following system of differential
equations:

\[
\begin{aligned}
\frac{dx}{dt} &= -\phi_1(t)\, r_1 x(t), & x(0) &= x_0, \\
\frac{dy}{dt} &= -\phi_2(t)\, r_2 y(t), & y(0) &= y_0, \\
\frac{dp}{dt} &= -p(t)\bigl[\phi_1(t) q_1 + \phi_2(t) q_2\bigr], & p(0) &= 1, \\
\frac{df}{dt} &= p(t)\bigl[\phi_1(t) r_1 x(t) + \phi_2(t) r_2 y(t)\bigr], & f(0) &= 0,
\end{aligned} \tag{2.2}
\]

where $0 \le \phi_1 \le 1$, $\phi_2 = 1 - \phi_1$. The problem is now to
determine $\phi_1(t)$ so as to maximize $f(\infty)$. It is not difficult
to give a proof based upon, say, weak convergence, which will assure us
that the maximum is actually attained. As W. Fleming has kindly informed
us, the existence of a maximum is guaranteed by a general theorem in
the calculus of variations.
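
As a worked illustration of system (2.2), consider the special case of a constant mixture, $\phi_1(t) = \phi$, $\phi_2(t) = 1 - \phi$. This case is chosen here only because it integrates in closed form; it is not claimed to be optimal. The abbreviation $Q$ is introduced for convenience, and $p(0) = 1$ is taken because the process is certainly in progress at $t = 0$.

```latex
\[
Q = \phi q_1 + (1-\phi) q_2, \qquad
x(t) = x_0 e^{-\phi r_1 t}, \quad
y(t) = y_0 e^{-(1-\phi) r_2 t}, \quad
p(t) = e^{-Q t},
\]
\[
f(\infty) = \int_0^\infty e^{-Qt}\bigl[\phi r_1 x_0 e^{-\phi r_1 t}
          + (1-\phi) r_2 y_0 e^{-(1-\phi) r_2 t}\bigr]\,dt
          = \frac{\phi r_1 x_0}{\phi r_1 + Q}
          + \frac{(1-\phi) r_2 y_0}{(1-\phi) r_2 + Q}.
\]
```

In particular, using A throughout ($\phi = 1$) gives $f(\infty) = r_1 x_0/(r_1 + q_1)$, and using B throughout ($\phi = 0$) gives $f(\infty) = r_2 y_0/(r_2 + q_2)$.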

Since  the  equations  are  fortunately  nonlinear,  variational 
techniques  are  particularly  applicable.  We  find 


Theorem 1. The maximum value of $f(\infty)$ is attained by the policy


(a) if $q_2 r_1 x > q_1 r_2 y$, $\phi_1 = 1$,

(b) if $q_1 r_2 y > q_2 r_1 x$, $\phi_2 = 1$,

(c) if $q_2 r_1 x = q_1 r_2 y$, $\phi_1 = r_2/(r_1+r_2)$, $\phi_2 = r_1/(r_1+r_2)$.


(2.3)
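
The policy of Theorem 1 can be compared numerically with simpler policies. The following is a rough sketch that integrates system (2.2) by forward Euler steps under a feedback rule; the step size, the finite horizon standing in for $t = \infty$, the equality tolerance, and the function names are all assumptions made for illustration. On a discrete time grid the state rarely lies exactly on the boundary line, so in practice the rule alternates between A and B near that line, which approximates the mixture in case (c).

```python
def expected_return(policy, x0, y0, r1, r2, q1, q2, horizon=30.0, dt=1e-3):
    """Forward-Euler integration of system (2.2) under a feedback rule
    policy(x, y) -> phi_1; returns f(horizon) as a proxy for f(infinity)."""
    x, y, p, f = x0, y0, 1.0, 0.0
    for _ in range(int(round(horizon / dt))):
        phi1 = policy(x, y)
        phi2 = 1.0 - phi1
        gain = p * (phi1 * r1 * x + phi2 * r2 * y)    # df/dt
        hazard = p * (phi1 * q1 + phi2 * q2)          # -dp/dt
        x -= dt * phi1 * r1 * x
        y -= dt * phi2 * r2 * y
        p -= dt * hazard
        f += dt * gain
    return f


def theorem1_policy(r1, r2, q1, q2, tol=1e-9):
    """Rule (2.3): A where q2*r1*x > q1*r2*y, B where the inequality is
    reversed, and the boundary mixture where the two sides agree."""
    def policy(x, y):
        lhs, rhs = q2 * r1 * x, q1 * r2 * y
        if lhs > rhs + tol:
            return 1.0
        if rhs > lhs + tol:
            return 0.0
        return r2 / (r1 + r2)
    return policy
```

For instance, `expected_return(theorem1_policy(1.0, 1.0, 1.0, 2.0), 1.0, 1.0, 1.0, 1.0, 1.0, 2.0)` can be set against `expected_return(lambda x, y: 1.0, ...)` (A always) or `expected_return(lambda x, y: 0.0, ...)` (B always) with the same arguments.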
Note that the boundary line is, as might be expected, the
set of points where expected gain over expected cost is the same
for both choices, A and B.
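
One way to see where the mixture in case (c) of Theorem 1 comes from is to ask which mixture keeps the point (x, y) on the boundary line once it has reached it. The short derivation below assumes only the first two equations of (2.2); it is offered as a plausibility check and is not the proof of the theorem.

```latex
\[
\frac{r_1 x}{q_1} = \frac{r_2 y}{q_2}
\quad\Longleftrightarrow\quad
q_2 r_1 x = q_1 r_2 y .
\]
Remaining on this line requires
\[
\frac{d}{dt}\bigl(q_2 r_1 x - q_1 r_2 y\bigr)
 = -\,q_2 r_1 \cdot \phi_1 r_1 x \;+\; q_1 r_2 \cdot \phi_2 r_2 y = 0 .
\]
Substituting $q_2 r_1 x = q_1 r_2 y$ gives $\phi_1 r_1 = \phi_2 r_2$, and
with $\phi_1 + \phi_2 = 1$,
\[
\phi_1 = \frac{r_2}{r_1 + r_2}, \qquad \phi_2 = \frac{r_1}{r_1 + r_2}.
\]
```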

If the time T available for the process is finite, the optimal
policy has one of the following six forms:

(a) A always,             (d) A, then M, then A,

(b) B always,             (e) B, then M, then A,            (2.4)

(c) M followed by A,      (f) B followed by A.

This is for $q_1 < q_2$; a similar result holds for $q_2 < q_1$.

The precise intervals within which each is used may be determined
explicitly. Here M represents the choice given in (2.3c).

The optimal strategy represents a compromise between the long-term
policy given in Theorem 1 and the short-term policy of maximizing
expected gain.

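The 3-choice problem corresponding to (1.2) has the continuous
analogue:

\[
\begin{aligned}
\frac{dx}{dt} &= -\bigl[\phi_1(t) r_1 + \phi_3(t) r_3\bigr] x(t), \\
\frac{dy}{dt} &= -\bigl[\phi_2(t) r_2 + \phi_3(t) r_4\bigr] y(t), \\
\frac{dp}{dt} &= -p(t)\bigl[\phi_1(t) q_1 + \phi_2(t) q_2 + \phi_3(t) q_3\bigr], \\
\frac{df}{dt} &= p(t)\bigl[(\phi_1(t) r_1 + \phi_3(t) r_3)\, x(t)
               + (\phi_2(t) r_2 + \phi_3(t) r_4)\, y(t)\bigr],
               \qquad f(0) = 0,
\end{aligned} \tag{2.5}
\]

where $0 \le \phi_i \le 1$, $\phi_1 + \phi_2 + \phi_3 = 1$.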
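
System (2.5) can be explored numerically in the same spirit as the two-choice case. The sketch below is a plain forward-Euler integrator. It assumes the initial values x(0) = x0, y(0) = y0, p(0) = 1, a finite horizon in place of $t = \infty$, and a user-supplied rule returning the three mixing fractions; the function name and argument layout are illustrative.

```python
def three_choice_return(policy, x0, y0, r, q, horizon=30.0, dt=1e-3):
    """Forward-Euler integration of system (2.5).

    policy(t, x, y) -> (phi1, phi2, phi3), nonnegative and summing to 1;
    r = (r1, r2, r3, r4), q = (q1, q2, q3).
    Returns f(horizon) as a proxy for f(infinity).
    """
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    x, y, p, f = x0, y0, 1.0, 0.0
    for n in range(int(round(horizon / dt))):
        phi1, phi2, phi3 = policy(n * dt, x, y)
        rate_x = phi1 * r1 + phi3 * r3            # depletion rate of x
        rate_y = phi2 * r2 + phi3 * r4            # depletion rate of y
        hazard = phi1 * q1 + phi2 * q2 + phi3 * q3
        f += dt * p * (rate_x * x + rate_y * y)
        x, y, p = x - dt * rate_x * x, y - dt * rate_y * y, p - dt * hazard * p
    return f
```

As a consistency check, a rule that always returns `(0.0, 0.0, 1.0)` should give a value close to $r_3 x_0/(r_3+q_3) + r_4 y_0/(r_4+q_3)$, which follows from integrating (2.5) directly for that pure policy.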

The maximum value of $f(\infty)$ is provided in the general case by
the policy represented schematically by

[Diagram (2.6): the region x, y > 0 divided by two lines, $L_1$ and $L_2$, into three regions.]

(2.6)


Depending upon the values of the parameters, one line is an
absorbing barrier, which is to say, for (x,y) on the line a mixed
policy is pursued which keeps the point on the line. This line
has as its equation the equality of expected gain over expected
cost. The other line will be a translucent barrier, causing only
a change from one pure policy, $\phi_i = 1$, to another, $\phi_j = 1$,
and is not defined by an equality of the above type. In special cases
the middle region disappears and $L_1$ coincides with $L_2$. The
solution is now that for the two-choice case.

This last result is quite surprising and explains some of the
difficulties of the discrete problem. One boundary, the absorbing
barrier, is determined by a local condition, whereas the other is
determined by a global condition.

Theorem 5. If, in the two-choice problem described by (2.2), in place
of expected return, we seek to maximize the expected value of some
function $\Phi$ of the total return where $\Phi$ is any strictly
increasing function, the solution is that given by (2.3).

To obtain this result we consider

\[
G = -\int_0^\infty \Phi\bigl(x_0 + y_0 - x(t) - y(t)\bigr)\, dp(t), \tag{2.7}
\]

the quantities being defined as in (2.1).
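
As a consistency check on (2.7), one can verify that it reduces to $f(\infty)$ when the function of total return is the identity. The derivation below writes that function as $\Phi$, takes $\Phi(z) = z$, and assumes $q_1, q_2 > 0$ so that $p(t) \to 0$ as $t \to \infty$; it uses only the equations of (2.2).

```latex
\[
G = -\int_0^\infty \bigl(x_0 + y_0 - x - y\bigr)\,dp
  = -\Bigl[\bigl(x_0 + y_0 - x - y\bigr)\,p\Bigr]_0^\infty
    + \int_0^\infty p\,\Bigl(-\frac{dx}{dt} - \frac{dy}{dt}\Bigr)\,dt .
\]
The bracket vanishes at $t = 0$ because $x(0) = x_0$, $y(0) = y_0$, and at
$t = \infty$ because $p \to 0$ while $0 \le x_0 + y_0 - x - y \le x_0 + y_0$.
Hence
\[
G = \int_0^\infty p(t)\bigl[\phi_1(t) r_1 x(t) + \phi_2(t) r_2 y(t)\bigr]\,dt
  = f(\infty).
\]
```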

The proofs of the above results are long and detailed, depending
upon a precise analysis of the properties of an optimal policy.

§3. More General Processes.


Let us now consider the more general process corresponding to
(1.1). Here the use of A leads to a variety of possible gains, and
similarly for B. The quantities x(t) and y(t), as defined by
(2.1), are now stochastic quantities. This means that it is no
longer possible to obtain the equations of (2.2). Instead we must
introduce the function P(u,v,t) defined by the property that

\[
\Pr\{\, u \le x(t) \le u + du,\ v \le y(t) \le v + dv \,\} = P(u,v,t)\,du\,dv. \tag{3.1}
\]

We may now, in a way similar to that followed in §2, derive a partial
differential equation for P of the form

\[
\frac{\partial P}{\partial t} = P(u,v)\,\frac{\partial P}{\partial u}
                              + Q(u,v)\,\frac{\partial P}{\partial v}. \tag{3.2}
\]

The system of ordinary differential equations

\[
\frac{du}{dt} = P(u,v), \qquad \frac{dv}{dt} = Q(u,v) \tag{3.3}
\]

connected with (3.2) will have a form similar to the first two
equations in (2.2).

The differential equations we have used to define our continuous
processes bear the same relation to the rigorous integral equations
defined by the original processes as the heat equation bears
to the Chapman-Kolmogoroff equations.

Finally, let us note that the above formalism is also applicable
to two-person multi-stage games of continuous type, and, in
particular, to pursuit games.

These extensions will be discussed in subsequent communications.


P-436 



BIBLIOGRAPHY 


[1] Bellman, R. "On the Theory of Dynamic Programming," Proc.
    Nat. Acad. Sci., 38 (1952), pp. 716-719.

[2] ------. "Some Functional Equations in the Theory of
    Dynamic Programming," Proc. Nat. Acad. Sci. (to appear).

[3] ------. "Bottleneck Problems and the Theory of Dynamic
    Programming," Proc. Nat. Acad. Sci. (to appear).

[4] ------. "A Problem in the Theory of Dynamic Programming,"
    Econometrica (to appear).

[5] ------. "Computational Problems in the Theory of Dynamic
    Programming," Proc. of Symposium on Numerical
    Analysis, Santa Monica, 1953.

[6] Bellman, R. and Blackwell, D. "Some Two-person Games Involving
    Bluffing," Proc. Nat. Acad. Sci., 35 (1949), pp. 600-605.


